<html>
<head>
<title>query performance improvements</title>
<link href="styles/doc.css" rel="stylesheet" type="text/css">
<style>
</style>
</head>
<body>

<h1>query performance improvements in Jena 2.3</h1>

<p>
The performance of RDQL-style queries on Jena in-memory
models has been improved, in some cases substantially.
The improvements come from two sources.

<h2>literal indexing</h2>

<p>Jena's in-memory graphs are indexed by the values in
subject, predicate, and object positions of statements,
to allow all the statements with a given value in one
of those positions to be found rapidly.
Previous versions of Jena did not index literal objects,
because of data-typing complications - e.g., the two values
<code>"1"^^xsd:integer</code> and <code>"01"^^xsd:integer</code>
are the same, but both are different from <code>"1"^^xsd:string</code>.
This meant that query patterns containing a literal 
and unbound subject and predicate (and, for the same
reason, <code>listStatements()</code> calls with just
a literal) executed slowly.

<p>Jena 2.3 indexes literals by their semantic value, the
value that is used by the <code>sameValueAs</code> method.
Literals may now freely and efficiently be used as the
distinguishing feature of a query triple or a 
<code>listStatements()</code> call.

<p>Because the SPARQL [and RDQL] memory-model query engine
uses the order of the triples in the query to guide its search of 
the model, queries containing triples with strongly distinguishing
literals (those that don't appear much in the model) can usefully
be moved earlier in the query.
  
<h2>improved query engine</h2>

<p>The internals of the memory-model query engine, and some details
of the memory-model itself, have been revised to improve
query performance. (It is possible that in some cases models
will take a little longer to load than before.) 

<p>The revisions primarily strip out redundant operations,
doing more work earlier in the query handling in order to
avoid repeated work later. Because of this moving around in
the code, some error situations may be detected at different
times, or in the worse case, not at all: in particular,
updating a model during a query may not be detected.

<p>Apart from changes to exploit literal indexing, queries
should not need to be changed to exploit the performance
improvements. 

<p>Our local tests have had improvements of the order of
four times faster, but it will be heavily dependant on the
shape of the query.

</body>
</html>
